FRESCO: the French telephone speech data collection - part of the European SpeechDat(M) project
Abstract
This paper describes the design, collection and postprocessing of the French SpeechDat corpus FRESCO. A database of approximately 35,000 utterances recorded from 1000 callers over the terrestrial telephone network in France, it provides immediately usable and relevant speech for the initial training and assessment of speaker-independent phoneme-model or word-model based speech recognizers, as employed in automated telephone services. FRESCO is one of the 1000-speaker telephone speech databases produced as "case studies" within the European project SpeechDat(M).
Related papers
SpeechDat Cymru: A Large-scale Welsh Telephony Database
We describe the collection of SpeechDat Cymru, a 2000-speaker speech recognition database for the Welsh language, recorded over the public switched telephone network (PSTN). It is collected as part of SpeechDat(II), an ELRA project which deals with the creation of databases in over 20 different European languages and dialects. Design issues common to all SpeechDat(II) databases are discussed, i...
Development of New Telephone Speech Databases for French: the NEOLOGOS Project
The NEOLOGOS project is a speech database creation project for the French language, resulting from a collaboration between French universities and industrial companies, and supported by the French Ministry for Research. The goal of NEOLOGOS is to create new kinds of speech databases: firstly, a 1000-speaker telephone database of children’s voices, called PAIDIALOGOS, following the SpeechDat g...
Development of the Estonian SpeechDat-like database
A new database project was launched in Estonia last year. It aims to collect telephone speech from a large number of speakers for speech and speaker recognition purposes. Up to 2000 speakers are expected to participate in the recordings. SpeechDat databases, especially Finnish SpeechDat, have been chosen as a prototype for the Estonian database. It means that principles of corpus design...
SpeechDat-E: five Eastern European speech databases for voice-operated teleservices completed
In the SpeechDat-E project, five medium-large telephone speech databases have been collected for Czech, Hungarian, Polish, Russian, and Slovak. The project was recently concluded. This paper reports briefly on the contents of the databases and elaborates on experiences gained from the data recordings and from the validation of the databases. The availability of the databases to the public is addres...
First experiences of the German SpeechDat-Car database collection in mobile environments
In SpeechDat-Car, speech databases for speech driven devices and services for mobile environments are collected for nine European languages. The German SpeechDat-Car installation was the first fully equipped platform within the project. It has served as a testbed for the recording software for the entire project, and as an opportunity to perform technical and organizational feasibility tests fo...